An Analysis on Removal of Duplicate Records using Different Types of Data Mining Techniques: A Survey

Author

  • P. Selvi
Abstract

Rapid advances in information technology have created a need for large volumes of storage to hold datasets. Most data warehouses draw their data from different data marts, and as a result there is a high likelihood of latent duplicate records. Many systems are troubled by the presence of duplicates in the database, which leads to problems such as slow performance, degraded data quality, wasted storage, and high operating costs. The presence of duplicates also produces misleading results: system reports fail to retrieve the correct data for the query involved, and the time complexity is high. These issues can be resolved by record deduplication, which is one of the necessary tasks in data preprocessing. The process yields cleaned data and replica-free repositories, from which higher-quality information can be retrieved. Record deduplication is the process of identifying and removing records in data storage that refer to the same entity across different data sources. It is necessary when linking entity-based datasets that may or may not share a common identifier. This paper gives a detailed introduction to data deduplication and presents a comprehensive study of existing techniques for removing data replication using deduplication.
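As a rough illustration of the deduplication idea described in the abstract, the following minimal Python sketch drops records that are near-duplicates of one already kept. It is not a technique taken from the surveyed paper: the field-wise string similarity from the standard library's `difflib`, the helper names `normalize`, `similarity`, and `deduplicate`, the 0.85 threshold, and the sample records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase every field and collapse runs of whitespace."""
    return tuple(" ".join(str(value).lower().split()) for value in record)


def similarity(a, b):
    """Mean per-field similarity ratio, a value between 0 and 1."""
    scores = [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]
    return sum(scores) / len(scores)


def deduplicate(records, threshold=0.85):
    """Keep the first record of each group of near-duplicates.

    The threshold is an illustrative assumption, not a recommended value.
    """
    kept = []       # original records that are retained
    kept_norm = []  # their normalized forms, used for comparison
    for record in records:
        norm = normalize(record)
        if any(similarity(norm, other) >= threshold for other in kept_norm):
            continue  # treated as a duplicate of a record already kept
        kept.append(record)
        kept_norm.append(norm)
    return kept


# Hypothetical sample data: three spellings of one person, plus another person.
records = [
    ("John Smith", "12 Main Street"),
    ("john  smith", "12 Main Street"),
    ("Jon Smith", "12 Main Street"),
    ("Mary Jones", "4 Oak Avenue"),
]
unique = deduplicate(records)
```

This sketch compares every record with every record kept so far, so its cost grows quadratically with the number of records; it is meant only to make the notion of "records that refer to the same entity" concrete.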


Similar resources

A New Method for Duplicate Detection Using Hierarchical Clustering of Records

Accuracy and validity of data are prerequisites for the correct operation of any software system. Errors can always occur in data because of human and system faults. One such error is the existence of duplicate records in data sources. Duplicate records refer to the same real-world entity. Only one of them should exist in a data source, but for some reasons like aggregation of ...


Diagnosis of diabetes by using a data mining method based on native data

Background & Aim: Detecting diabetes early and subsequently getting proper treatment can reduce the mortality associated with the disease. Also, a lack of timely diagnosis can result in irreversible complications for the patient. The aim of this study was to determine the status of diabetes mellitus using data mining techniques. Methods: This is an analytical study and its databas...


Using Data Mining Techniques for Intelligent Diagnosis of Severity of Depressive Disorder

Introduction: Implementing a method that can help individuals diagnose or prevent mental disorders can be an important step in preventing and controlling these disorders especially in the early stages. The objective of this research was to apply data mining techniques for intelligent diagnosis of severity of depressive disorder. Method: The present applied research was carried out by going to a...


Experimental Studies, Response Surface Methodology and Molecular Modeling for Optimization and Mechanism Analysis of Methylene Blue Dye Removal by Different Clays

In this work, three types of natural clays including kaolinite, montmorillonite, and illite with different molecular structures, as adsorbents, are selected for the removal of methylene blue dye, and their performance is investigated. Also the optimization and the analysis of the dye adsorption mechanism are performed using the response surface methodology, molecular modeling, and experimental ...



Publication date: 2017